About Our Project

For our end-of-semester project, we explored the capabilities of a Microsoft Kinect™. First, the specs of the Kinect are as follows:

Sensors/Actuators:
- Color and depth-sensing cameras
- Multi-microphone array
- Tilt motor

Field of View:
- Horizontal field of view: 57 degrees
- Vertical field of view: 43 degrees
- Physical tilt range: ±27 degrees
- Depth sensor range: 1.2 m – 3.5 m

Data Streams:
- 320×240 16-bit depth @ 30 frames/sec
- 640×480 32-bit color @ 30 frames/sec
- 16-bit audio @ 16 kHz

To access the Kinect, we first needed software to interface with the device. Luckily, OpenKinect had already been built by an open-source community of hackers who derived Microsoft's encoding of the Kinect data and commands through reverse engineering. The library has notable wrappers in Python, C++, C#, Java, JavaScript, and Lisp. So far, the main useful feature of OpenKinect is essentially a low-level driver called 'libfreenect'. Its functionality includes setting the LED to different colors (3 colors plus 2 blinking modes), setting the pitch (i.e., up/down tilt) of the Kinect, and reading the data returned from the device. This data currently includes the RGB frames and the depth frames; the OpenKinect website states that full support for the microphone array will be added in the future. For more information regarding the protocol and the workings of the individual sensors, working Kinect projects and example code, or any other question specific to the Kinect, please refer to the OpenKinect website.
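As a concrete illustration, here is a minimal sketch (not our exact code) of grabbing frames through libfreenect's Python wrapper. The sync_get_video/sync_get_depth calls come from the wrapper's synchronous API, and set_led/set_tilt_degs operate on an opened device handle; all of these names are assumptions that should be checked against the wrapper version you have installed.

    # Minimal sketch: read Kinect frames via the libfreenect Python wrapper.
    import freenect
    import numpy as np

    def grab_frames():
        """Grab one RGB frame and one depth frame as NumPy arrays."""
        rgb, _ = freenect.sync_get_video()    # 8-bit RGB image
        depth, _ = freenect.sync_get_depth()  # raw depth values (larger = farther)
        return rgb, depth

    def set_led_and_tilt(led, degrees):
        """Set the LED color and tilt angle (assumed wrapper calls)."""
        ctx = freenect.init()
        dev = freenect.open_device(ctx, 0)
        freenect.set_led(dev, led)            # e.g. freenect.LED_GREEN
        freenect.set_tilt_degs(dev, degrees)  # within the physical +/-27 degree range

    if __name__ == '__main__':
        rgb, depth = grab_frames()
        print(rgb.shape, depth.shape)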

To do anything interesting with the data returned by the Kinect, specifically the RGB and depth frames, we researched another library called OpenCV, originally built by Intel's open-source branch (along with numerous other researchers) but now primarily maintained by the robotics company Willow Garage. The code is hosted on a brand-new site (as of 5/7/12) here, while the main wiki and documentation are here. The wiki page is being redone at the time of writing, so this link may be broken in the future. The library covers a far wider range of image-processing tasks than could be explored in a single semester, so we chose one algorithm to analyze that was both feasible to learn about and quick to implement.

When 'tracking' is turned on in our program, it records a history of the last N images (2 in our case, since the frame rate from the Kinect was lower than expected). The absolute differences between these images are accumulated, segmented into connected components (the regions of interest), and small components due to noise are thrown out. From this, it is simple mathematics to get the x and y location of a component in the plane of the camera image. From the motion history it is also possible to estimate the angle at which a connected component is traveling in the image plane, but this behaves somewhat erratically and should not be trusted for more than determining simple directions such as left vs. right. Finally, knowing the region of interest (ROI), one can simply average the depth image over that region to get a rough estimate of the depth of the object of interest. In fact, this program could be used to track multiple objects, but it would need additional logic to differentiate between objects and to guard against the case where two or more distinct objects are close enough to be merged into one component by the tracking algorithm.
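The following is a rough sketch of that tracking step using OpenCV's Python bindings (cv2) and NumPy. The thresholds, the two-frame history, and the use of findContours for the connected-component step are illustrative assumptions rather than our exact code (the findContours signature shown matches the OpenCV 2.x bindings), and it assumes the depth frame is aligned with the grayscale frames.

    # Sketch of the tracking step: accumulate frame differences, keep the
    # largest connected component, and estimate its (x, y) position and depth.
    import cv2
    import numpy as np

    N_HISTORY = 2        # number of past frames kept (2 worked for our frame rate)
    MIN_AREA = 500       # components smaller than this (in pixels) are noise
    DIFF_THRESHOLD = 30  # grey-level difference treated as "motion"

    def track(gray_history, depth):
        """Return (x, y, depth) of the largest moving region, or None.

        gray_history : list of the last N_HISTORY grayscale frames (newest last)
        depth        : depth frame aligned with the newest grayscale frame
        """
        # Accumulate absolute differences between consecutive frames.
        acc = np.zeros(gray_history[0].shape, dtype=np.float32)
        for prev, curr in zip(gray_history, gray_history[1:]):
            acc += cv2.absdiff(curr, prev).astype(np.float32)

        # Threshold the accumulated motion and extract connected components.
        _, motion = cv2.threshold(acc, DIFF_THRESHOLD, 255, cv2.THRESH_BINARY)
        motion = motion.astype(np.uint8)
        contours, _ = cv2.findContours(motion, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)

        # Drop small components (noise) and keep the largest remaining one.
        contours = [c for c in contours if cv2.contourArea(c) >= MIN_AREA]
        if not contours:
            return None
        roi = max(contours, key=cv2.contourArea)

        # Centroid in the image plane via image moments.
        m = cv2.moments(roi)
        x, y = m['m10'] / m['m00'], m['m01'] / m['m00']

        # Rough depth estimate: average the depth image over the bounding box.
        bx, by, bw, bh = cv2.boundingRect(roi)
        z = float(np.mean(depth[by:by + bh, bx:bx + bw]))
        return x, y, z

In a capture loop, one would convert each new RGB frame to grayscale, append it to the history, and call track() with the matching depth frame; comparing the returned (x, y) across successive calls gives the coarse left/right direction mentioned above.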

2012 | Thanks to the OpenCV and OpenKinect Communities
